UNIT — 3 XML (Extensible Markup Language) 
1. Introduction to XML? 


XML stands for Extensible Markup Language. It is a text-based markup language derived 
from Standard Generalized Markup Language (SGML). 


XML tags identify the data and are used to store and organize it, rather than
specifying how to display it, as HTML tags do. XML is not going to replace HTML
in the near future, but it introduces new possibilities by adopting many
successful features of HTML.


There are three important characteristics of XML that make it useful in a variety of systems
and solutions:

• XML is extensible: XML allows you to create your own self-descriptive tags, or
language, that suits your application.

• XML carries the data, does not present it: XML allows you to store the data
irrespective of how it will be presented.

• XML is a public standard: XML was developed by an organization called the World
Wide Web Consortium (W3C) and is available as an open standard.


XML Usage 


A short list of XML usage says it all:

• XML can work behind the scenes to simplify the creation of HTML documents for large
web sites.

• XML can be used to exchange information between organizations and systems.

• XML can be used for offloading and reloading of databases.

• XML can be used to store and arrange data, which can customize your data handling
needs.

• XML can easily be merged with style sheets to create almost any desired output.

• Virtually any type of data can be expressed as an XML document.

What is Markup?


XML is a markup language that defines a set of rules for encoding documents in a format that
is both human-readable and machine-readable. So what exactly is a markup
language? Markup is information added to a document that enhances its meaning in certain
ways, in that it identifies the parts and how they relate to each other. More specifically, a
markup language is a set of symbols that can be placed in the text of a document to
demarcate and label the parts of that document.


Mohamedsohel Shaikh Unit 3 Page No. 1 


The following example shows how XML markup looks when embedded in a piece of text:

<message>
   <text>Hello, world!</text>
</message>





This snippet includes the markup symbols, or the tags such as <message>...</message> and 
<text>... </text>. The tags <message> and </message> mark the start and the end of the 
XML code fragment. The tags <text> and </text> surround the text Hello, world!. 
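
To see how a program reads this markup, the snippet can be handed to a parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of these notes); the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

snippet = "<message><text>Hello, world!</text></message>"

# Parse the markup into a tree of elements.
message = ET.fromstring(snippet)

# The outer element labels the whole fragment; the inner one labels the text.
text = message.find("text")

print(message.tag)   # message
print(text.tag)      # text
print(text.text)     # Hello, world!
```

The tags themselves never appear in the extracted text; they only tell the parser where each labelled part starts and ends.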


2. Explain XML Syntax, its declaration, tags and elements? 


An XML document can optionally have an XML declaration. It is written as follows:

<?xml version = "1.0" encoding = "UTF-8"?>

where version is the XML version and encoding specifies the character encoding used in
the document.

UTF stands for UCS Transformation Format, and UCS itself means Universal Character
Set. The number 8 or 16 refers to the size in bits of the code units used to represent a
character: UTF-8 uses one to four 8-bit units (1 to 4 bytes) per character, and UTF-16 uses
one or two 16-bit units (2 or 4 bytes). For documents without encoding
information, UTF-8 is assumed by default.
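
The byte counts mentioned above can be checked directly. The following is a small sketch, assuming Python (not used elsewhere in these notes); the helper name encoded_lengths and the three sample characters are chosen only for illustration.

```python
def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-le" is used so that no byte-order mark is added to the count.
    utf16_len = len(ch.encode("utf-16-le"))
    return utf8_len, utf16_len

# An ASCII letter, the euro sign, and an emoji outside the 16-bit range.
for ch in ["A", "\u20ac", "\U0001F600"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} byte(s), UTF-16 = {utf16_len} byte(s)")
```

The letter takes 1 byte in UTF-8 and 2 in UTF-16, the euro sign takes 3 and 2, and the emoji takes 4 in both, which matches the ranges given above.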


Syntax Rules for XML Declaration 


• The XML declaration is case sensitive and must begin with "<?xml", where "xml" is
written in lower case.

• If the document contains an XML declaration, it must be the first statement of
the XML document.

Tags and Elements


XML elements can be defined as the building blocks of an XML document. Elements can behave as
containers to hold text, elements, attributes, media objects, or all of these.

Each XML document contains one or more elements, the scope of which is delimited either
by start and end tags or, for empty elements, by an empty-element tag.


Syntax 


Following is the syntax to write an XML element:

<element-name attribute1 attribute2>
   ....content
</element-name>

where,


• element-name is the name of the element. The name and its case in the start and end tags
must match.

• attribute1, attribute2 are attributes of the element, separated by white space. An
attribute defines a property of the element. It associates a name with a value, which is
a string of characters. An attribute is written as:

name = "value"

The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.


An XML file is structured by several XML elements, also called XML nodes or XML tags.
The names of XML elements are enclosed in angle brackets < > as shown below:

<element>


Syntax Rules for Tags and Elements 


Element Syntax: Each XML element must be closed, either with a matching end tag
as shown below:

<element>....</element>

or, in simple cases (an element with no content), with a self-closing tag:

<element/>


Nesting of Elements: An XML element can contain multiple XML elements as its
children, but the child elements must not overlap; i.e., the end tag of an element must
have the same name as that of the most recent unmatched start tag.

The following example shows incorrectly nested tags:

<?xml version = "1.0"?>
<contact-info>
   <company>Arham Infotech
</contact-info>
</company>





The following example shows correctly nested tags:

<?xml version = "1.0"?>
<contact-info>
   <company>Arham Infotech</company>
</contact-info>
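
The nesting rule can also be checked mechanically, because a parser refuses documents whose tags overlap. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# </contact-info> arrives while <company> is still open: the tags overlap.
overlapping = "<contact-info><company>Arham Infotech</contact-info></company>"

# <company> is closed before its parent <contact-info> is closed.
nested = "<contact-info><company>Arham Infotech</company></contact-info>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```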

3. Explain root element and case sensitivity in XML.


Each XML document has exactly one root element. It encloses all the
other elements and is therefore the sole top-level parent of all the other elements. Root
elements are also called document elements. In HTML, the root element is the
<html> element. XML is a case-sensitive language: <Note> and <note> are different element names.


XML documents must contain one root element that is the parent of all other elements: 


<root>
   <child>
      <subchild>.....</subchild>
   </child>
</root>


In this example <note> is the root element: 


<?xml version="1.0" encoding="UTF-8"?> 
<note>
   <to>ABC</to>
   <from>XYZ</from>
   <heading>Reminder</heading>
   <body>Don't forget to call</body>
</note>
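
Both rules of this section, exactly one root element and case-sensitive names, can be observed with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the three sample documents and their labels are made up for illustration.

```python
import xml.etree.ElementTree as ET

documents = {
    "one root": "<note><to>ABC</to></note>",
    "two roots": "<note>first</note><note>second</note>",
    "case mismatch": "<Note><to>ABC</to></note>",
}

results = {}
for label, text in documents.items():
    try:
        root = ET.fromstring(text)
        results[label] = "ok, root is <" + root.tag + ">"
    except ET.ParseError as err:
        # The exact wording of the message comes from the underlying parser.
        results[label] = "rejected: " + str(err)
    print(label, "->", results[label])
```

Only the first document is accepted. The second has two top-level elements, and in the third the start tag <Note> and the end tag </note> count as different names.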


4. Write a note on XML Document. 


An XML document is a basic unit of XML information composed of elements and other markup
in an orderly package. An XML document can contain a wide variety of data, for example a
database of numbers, numbers representing a molecular structure, or a mathematical equation.


XML Document Example 


A simple document is shown in the following example:

<?xml version = "1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>








The following diagram depicts the parts of an XML document:

<?xml version="1.0"?>                         <- Document Prolog

<contact-info>                                <- Document Elements
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>

Document Prolog Section


The document prolog comes at the top of the document, before the root element. This section
contains:

• XML declaration
• Document type declaration


1) XML Declaration:

The XML declaration contains details that prepare an XML processor to parse the XML
document. It is optional, but when used, it must appear on the first line of the XML
document.


Syntax 


The following syntax shows the XML declaration:

<?xml
   version = "version_number"
   encoding = "encoding_declaration"
   standalone = "standalone_status"
?>





Each parameter consists of a parameter name, an equals sign (=), and a parameter value inside
quotes. The following table shows the above syntax in detail:

Parameter    Parameter value                   Parameter description
---------    ---------------                   ---------------------
version      1.0                               Specifies the version of the XML standard
                                               used.

encoding     UTF-8, UTF-16,                    Defines the character encoding used in
             ISO-10646-UCS-2,                  the document. UTF-8 is the default
             ISO-10646-UCS-4,                  encoding used.
             ISO-8859-1 to ISO-8859-9,
             ISO-2022-JP, Shift_JIS,
             EUC-JP

standalone   yes or no                         Informs the parser whether the document
                                               relies on information from an external
                                               source, such as an external document type
                                               definition (DTD), for its content. The
                                               default value is no. Setting it to yes
                                               tells the processor that no external
                                               declarations are required for parsing
                                               the document.


XML Declaration Examples 


Following are a few examples of XML declarations.

XML declaration with no parameters (the XML 1.0 specification requires the version
parameter, so conforming parsers reject this form):

<?xml ?>

XML declaration with version definition:

<?xml version = "1.0"?>

XML declaration with all parameters defined:

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>

XML declaration with all parameters defined in single quotes:

<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>

Document Elements Section

Document elements are the building blocks of XML. These divide the document into a
hierarchy of sections, each serving a specific purpose. You can separate a document into
multiple sections, so that they can be rendered differently or used by a search engine. The
elements can be containers, with a combination of text and other elements.


Syntax 


Following is the syntax to write an XML element:

<element-name attribute1 attribute2>
   ....content
</element-name>





where,

• element-name is the name of the element. The name and its case in the start and end tags
must match.

• attribute1, attribute2 are attributes of the element, separated by white space. An
attribute defines a property of the element. It associates a name with a value, which is
a string of characters. An attribute is written as:

name = "value"

The name is followed by an = sign and a string value inside double (" ") or single (' ') quotes.

Empty Element

An empty element (an element with no content) has the following syntax:

<name attribute1 attribute2.../>


Following is an example of an XML document using various XML elements:

<?xml version = "1.0"?>
<contact-info>
   <address category = "residence">
      <name>MS Shaikh</name>
      <company>MyTutorials</company>
      <phone>(011) 123-4567</phone>
   </address>
</contact-info>
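
To show how elements, attributes and text are read from such a document, it can be loaded into a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

document = """<?xml version = "1.0"?>
<contact-info>
   <address category = "residence">
      <name>MS Shaikh</name>
      <company>MyTutorials</company>
      <phone>(011) 123-4567</phone>
   </address>
</contact-info>"""

root = ET.fromstring(document)
address = root.find("address")

# Attributes are exposed as a name -> value mapping on the element.
category = address.get("category")

# Child elements are returned in document order.
child_names = [child.tag for child in address]
phone = address.findtext("phone")

print(category)      # residence
print(child_names)   # ['name', 'company', 'phone']
print(phone)         # (011) 123-4567
```

Here <address> acts as a container element with one attribute, while <name>, <company> and <phone> hold only text.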


XML Elements Rules 


The following rules must be followed for XML elements:

• An element name can contain any alphanumeric characters. The only punctuation marks
allowed in names are the hyphen (-), underscore (_) and period (.). A name must begin
with a letter or an underscore.

• Names are case sensitive. For example, Address, address, and ADDRESS are different
names.

• The names in the start and end tags of an element must be identical.

• An element, which is a container, can contain text or elements as seen in the above
example.
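
The naming rules can be tried out by asking a parser to accept an empty element for each candidate name. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module; the candidate names are invented for illustration, and the last three are expected to fail because an XML name cannot start with a digit or contain a space or an ampersand.

```python
import xml.etree.ElementTree as ET

candidates = ["address", "Address", "contact-info", "first_name", "v1.0",
              "1address", "my address", "name&co"]

accepted = []
for name in candidates:
    try:
        # Build an empty element with the candidate name and try to parse it.
        ET.fromstring("<" + name + "/>")
        accepted.append(name)
    except ET.ParseError:
        pass

print(accepted)
```

Both address and Address are accepted, but as two different names, which is why the start and end tags of one element must use the same spelling and case.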


2) Document Type Declaration:

The XML Document Type Declaration points to or contains a Document Type Definition,
commonly known as a DTD, which is a way to describe an XML language precisely. DTDs
check the vocabulary and the validity of the structure of XML documents against the
grammatical rules of the appropriate XML language.

An XML DTD can either be specified inside the document, or it can be kept in a separate
document and then linked separately.


Syntax 


The basic syntax of a DTD is as follows:

<!DOCTYPE element DTD identifier
[
   declaration1
   declaration2
   ........
]>





In the above syntax,

• The DTD starts with the <!DOCTYPE delimiter.

• element tells the parser to parse the document from the specified root element.

• DTD identifier is an identifier for the document type definition, which may be the path
to a file on the system or the URL of a file on the internet. If the DTD points to an external
path, it is called the External Subset.

• The square brackets [ ] enclose an optional list of entity declarations called the Internal
Subset.

Internal DTD


A DTD is referred to as an internal DTD if the elements are declared within the XML file. To
mark it as an internal DTD, the standalone attribute in the XML declaration is set to yes. This
means the declaration works independently of any external source.


Syntax

Following is the syntax of an internal DTD:

<!DOCTYPE root-element [element-declarations]>

where root-element is the name of the root element and element-declarations is where you
declare the elements.


Example 


Following is a simple example of an internal DTD:

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address [
   <!-- PCDATA = parsed character data -->
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>

<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>





Let us go through the above code.

Start Declaration: Begin the XML declaration with the following statement.

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>

DTD: Immediately after the XML header, the document type declaration follows,
commonly referred to as the DOCTYPE:

<!DOCTYPE address [

The DOCTYPE declaration has an exclamation mark (!) at the start of the element name.
The DOCTYPE informs the parser that a DTD is associated with this XML document.

DTD Body: The DOCTYPE declaration is followed by the body of the DTD, where you
declare elements, attributes, entities, and notations.


<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>





Several elements are declared here that make up the vocabulary of the <address> document.
<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA".
Here #PCDATA means parsed character data, that is, text that the parser examines for markup.


End Declaration — Finally, the declaration section of the DTD is closed using a closing 
bracket and a closing angle bracket (]>). This effectively ends the definition, and thereafter, 
the XML document follows immediately. 


Rules 


• The document type declaration must appear at the start of the document (preceded only
by the XML header); it is not permitted anywhere else within the document.

• Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.

• The name in the document type declaration must match the element type of the root
element.
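
The last rule can be illustrated by reading the DOCTYPE name and the root element name from a parsed document and comparing them. The following is a minimal sketch, assuming Python's standard xml.dom.minidom module. This parser does not validate the document against the DTD, so the comparison is done by hand here.

```python
from xml.dom import minidom

document = """<?xml version = "1.0" standalone = "yes" ?>
<!DOCTYPE address [
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>"""

dom = minidom.parseString(document)

doctype_name = dom.doctype.name           # name written after <!DOCTYPE
root_name = dom.documentElement.tagName   # name of the root element

print(doctype_name, root_name, doctype_name == root_name)
```

A validating parser would perform this check, together with the content-model checks from the <!ELEMENT ...> declarations, automatically.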


External DTD 


In an external DTD, elements are declared outside the XML file. They are accessed by
specifying the system attribute, which may be either a legal .dtd file or a valid URL. To
mark it as an external DTD, the standalone attribute in the XML declaration is set to no.
This means the declaration includes information from the external source.


Syntax

Following is the syntax for an external DTD:

<!DOCTYPE root-element SYSTEM "file-name">

where file-name is the file with the .dtd extension.

Example

The following example shows external DTD usage:

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</address>





The content of the DTD file address.dtd is as shown:

<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>


Types 





You can refer to an external DTD by using either system identifiers or public identifiers.

System Identifiers


A system identifier enables you to specify the location of an external file containing DTD
declarations. The syntax is as follows:

<!DOCTYPE name SYSTEM "address.dtd" [...]>

As you can see, it contains the keyword SYSTEM and a URI reference pointing to the location
of the document.

Public Identifiers

Public identifiers provide a mechanism to locate DTD resources and are written as follows:

<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN">

As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier.
Public identifiers are used to identify an entry in a catalog. Public identifiers can follow any
format; however, a commonly used format is called Formal Public Identifiers, or FPIs.